In this paper an approach to solving several combinatorial optimization problems using
local search and genetic algorithm techniques is proposed. Initially this approach was
developed to overcome some difficulties inhibiting the application of the above-mentioned
techniques to the problems of the Questionnaire Theory. But once the algorithms were
developed it became clear that they could also be successfully applied to the Minimum Set
Cover, the 0-1-Knapsack and probably to other combinatorial optimization problems.

1 Introduction

1.1 High-level overview of the proposed approach

The Optimal Binary Questionnaire problem is NP-hard [AC94]. In search of an
efficient approximate algorithm several approaches were investigated, and special
effort was dedicated to local search [AB90], [AB91]. However, all attempts to develop a
neighborhood function for binary questionnaires have led only to very limited success,
and a connected neighborhood has been found only for a tiny class of questionnaires of
rather theoretical importance [Bon03].

In this paper we propose to shift the research focus from the search within a set of
questionnaires to the search within a set of functions of a special kind. Such functions
allow the construction of a questionnaire by the consecutive choice of questions for each
subordinate problem, starting from the root one. In this paper a set of root question
selection functions (RQSFs) effectively interconnected by a natural neighborhood relation
is proposed.

After the implementation of the proposed algorithms it became clear that they could
also be applied to some other combinatorial optimization problems, including Minimum
Set Cover, 0-1-Knapsack and probably others. The background of this idea
is given by the reductions of the mentioned combinatorial optimization problems to
some questionnaire optimization problems, which are used in the proofs of NP-hardness
and NP-completeness of the questionnaire theory problems.

We expect that this approach could be successfully extended to many combinatorial
optimization problems for which an efficient local search neighborhood and effective
genetic operators have not been found yet or do not exist at all.

In the remaining part of the paper we will first show how the proposed approach works
for questionnaire optimization problems. Then we will briefly discuss the results of the
laboratory testing of the developed algorithms. Having done this, we will show how these
algorithms could be applied to other combinatorial optimization problems.

1.2 The questionnaire theory basic definitions 

As stated above, the mathematical model of the binary questionnaire plays the central
role in the presented approach, and because this is not a widely known mathematical
theory we give here a brief introduction to it. For more details we suggest [Pic72].

One of the central tasks of discrete search theory is building a conditional search
strategy that is optimal in some sense, i.e. a search strategy in which the choice of
any test depends on the outcomes of the previously applied ones.

One possible classification of discrete search problems is based on the principles
according to which test sets are formed. E.g. for the construction of a binary tree
[Huf52], [Sob60] one can choose any possible subdivision of the search area, while for a
binary search tree [HT71], [GW77] only tests preserving the linear order defined on the
search area are allowed.

Both the Optimal Binary Tree and the Optimal Binary Search Tree problems can be
generalized within this classification in the following natural way. Let us consider the
problem of constructing a conditional search strategy from a limited set of tests given
by an explicit enumeration. An example of such a problem is presented in Table 1
and one possible search strategy is given in Figure 1. Problems of this type are
the subject of the Questionnaire Theory (QT) [Pic72], [AC89].

Two types of tests (called questions) are considered in the theory of questionnaires.
Each question of the first type defines a subdivision of the search area into independent
classes. The outcomes of a question of the second type can have nonempty intersections
and cover the search area. In the first case the questionnaire can be represented by a

t    Outcomes
1    0:{y_1, y_2, y_3, y_4, y_5}, 1:{y_6, y_7, y_8, y_9}
2    0:{y_1, y_2, y_3, y_4}, 1:{y_5, y_6, y_7, y_8, y_9}
3    0:{y_1, y_2, y_5, y_6, y_7, y_8}, 1:{y_3, y_4, y_9}
4    0:{y_1, y_3, y_5, y_6, y_7, y_9}, 1:{y_2, y_4, y_8}
5    0:{y_1, y_2, y_3, y_4, y_5, y_6, y_8, y_9}, 1:{y_7}

Table 1: Example problem A_1



Figure 1: Example of an arborescence for the task in Table 1

rooted tree. Questionnaires of the second type are represented by acyclic graphs with a
single source vertex. Picard [Pic72] called questionnaires of the first type arborescences
and questionnaires of the second type latticoids. In this paper we will consider only
arborescent questionnaires. An example of an arborescent questionnaire is given in
Table 1 and Figure 1; an example of a latticoid questionnaire is given in Table 2
and Figure 2.

Application of a question within the questionnaire breaks the problem table into several
tables, one per outcome of the question. Thus for a binary question there will be 2
'sub-tables'. Each derived table is formed as the subset of columns of the basic problem
table whose entry in the row representing the 'asked' question equals the outcome
number. Thus the '0-subproblem' table of the question $t_i$ will contain all columns with
'0' in the i-th position.

Questions containing a single outcome are called senseless. In particular senseless ques- 
tions can be found in problem tables obtained after application of some question. Sense- 
less questions are removed from problem tables. 
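
The splitting rule and the removal of senseless questions can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionary-of-rows representation and the name `apply_question` are introduced here and are not part of the paper, and the example rows follow our reading of the questions of Table 1.

```python
# Problem table: one row per question, one column per event (assumed layout).
EVENTS = ["y1", "y2", "y3", "y4", "y5", "y6", "y7", "y8", "y9"]
TABLE = {
    1: [0, 0, 0, 0, 0, 1, 1, 1, 1],
    2: [0, 0, 0, 0, 1, 1, 1, 1, 1],
    3: [0, 0, 1, 1, 0, 0, 0, 0, 1],
    4: [0, 1, 0, 1, 0, 0, 0, 1, 0],
    5: [0, 0, 0, 0, 0, 0, 1, 0, 0],
}


def apply_question(table, events, q):
    """Split a problem table into one sub-table per outcome of question q."""
    subproblems = {}
    for outcome in sorted(set(table[q])):
        # Keep only the columns whose entry in row q equals this outcome.
        cols = [j for j, value in enumerate(table[q]) if value == outcome]
        sub_events = [events[j] for j in cols]
        sub_table = {t: [row[j] for j in cols] for t, row in table.items()}
        # Remove senseless questions: rows left with a single outcome.
        sub_table = {t: row for t, row in sub_table.items() if len(set(row)) > 1}
        subproblems[outcome] = (sub_table, sub_events)
    return subproblems


subs = apply_question(TABLE, EVENTS, 1)
print(sorted(subs[0][0]), subs[0][1])  # questions and events of the 0-subproblem
```

Asking question 1 at the root leaves questions 2, 3 and 4 meaningful for the 0-subproblem, while questions 1 and 5 become senseless there and are dropped.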

The number of possible outcomes of a question is called the question base. We will consider

Figure 2: Example of a latticoid questionnaire for the task in Table 2

t


Outcomes 


1 


0:{yi, ys, 2/4, y5,y6,y7, ys}, i:{y2, y4, ys, ye, yg} 


2 


0:{yi, y2, ys, y4, ys, ye, yg}, i:{y3, y4, yr, ys} 


3 


0:{yi, ys, y4, ys, ye, y?}, i:{y2, ys, y4, y?, ys, yg} 


4 


0:{y3, ye, yg}, i:{yi, y2, y4, ys, y?, ys} 


5 


0:{yi, y2, ys, y?, ys, yg}, i:{y4, ys, ye, yr, ys, yg} 


6 


0:{yi, y2, ys, ys, ye, ys, yg}, i:{yi, y2, ys, y4, ye, y?, yg} 


7 


0:{y2, ys, ye, y?, ys, yg}, i:{yi, y2, ys, y4, ys, ye, ys, yg} 



Table 2: ^2 



in this paper only binary questions, i.e. questions of base 2. Questionnaires built
from binary questions are correspondingly called binary.

The search area $Y$ is traditionally considered as a set of independent events $y_j$ with a
given discrete distribution $p(y_j) = p_j$. A convenient representation of a set of questions
$T$ is given by a table where each row represents one question and the number at the
intersection of row $i$ and column $j$ is the outcome of question $i$ to which event $y_j$
belongs. Table 3 contains the same set of questions as Table 1 but represented in the
manner just described.

 t     y_1   y_2   y_3   y_4   y_5   y_6   y_7   y_8   y_9
 1     0     0     0     0     0     1     1     1     1
 2     0     0     0     0     1     1     1     1     1
 3     0     0     1     1     0     0     0     0     1
 4     0     1     0     1     0     0     0     1     0
 5     0     0     0     0     0     0     1     0     0
p(y)   0,05  0,10  0,05  0,30  0,20  0,05  0,05  0,15  0,05

Table 3: The questions of Table 1 in tabular form. The cost column c(t) is only partly legible; the surviving values are 3,00, 7,00, 5,00 and 6,00.



Another aspect in which the questionnaire theory extends the traditional discrete search
models is the cost of individual tests. While for the optimization of binary trees and
binary search trees we traditionally take 1 as the cost of each test, the questionnaire
theory allows one to define a cost function on the set of questions, $c(t_j) = c_j \in \mathbb{R}$.
The sum of the costs of the questions applied in the current questionnaire to identify some
particular event $y_j$ is called the cost of identification of $y_j$. The mean value of the
cost of identification of the events from the search area $Y$ for the given questionnaire $Q$
is called the cost of the questionnaire, $C(Q) = \sum_{y_j \in Y} p(y_j)\, c(y_j)$, where $c(y_j)$
is the sum of the costs of the questions applied to identify $y_j$ in $Q$.

E.g. the cost of the questionnaire in Figure 1 is:

$$
\begin{aligned}
C(Q) = {} & (c_1 + c_2 + c_3 + c_4)(p(y_1) + p(y_2) + p(y_3) + p(y_4)) + {} \\
& (c_1 + c_2)\,p(y_5) + {} \\
& (c_1 + c_3 + c_4 + c_5)(p(y_6) + p(y_7)) + {} \\
& (c_1 + c_3 + c_4)\,p(y_8) + {} \\
& (c_1 + c_3)\,p(y_9) = 18{,}45
\end{aligned}
\tag{1}
$$
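
The definition of the questionnaire cost can be illustrated with a short Python sketch. The nested-tuple encoding of the tree and the name `questionnaire_cost` are assumptions made for this illustration; the tree shape follows the branch structure of equation (1) and the probabilities are those of the example problem. Unit question costs are assumed here for simplicity, so the result is the mean number of questions asked and is not meant to reproduce the value in (1).

```python
# Event probabilities of the example problem.
P = {"y1": 0.05, "y2": 0.10, "y3": 0.05, "y4": 0.30, "y5": 0.20,
     "y6": 0.05, "y7": 0.05, "y8": 0.15, "y9": 0.05}

# Assumption for illustration only: every question costs 1.
UNIT_COST = {t: 1.0 for t in range(1, 6)}

# A leaf is an event name; an internal vertex is (question, {outcome: subtree}).
TREE = (1, {
    0: (2, {0: (3, {0: (4, {0: "y1", 1: "y2"}),
                    1: (4, {0: "y3", 1: "y4"})}),
            1: "y5"}),
    1: (3, {0: (4, {0: (5, {0: "y6", 1: "y7"}), 1: "y8"}),
            1: "y9"}),
})


def questionnaire_cost(node, p, c, spent=0.0):
    """Mean identification cost: sum over events of p(y) * cost of its branch."""
    if isinstance(node, str):
        return p[node] * spent
    question, branches = node
    return sum(questionnaire_cost(child, p, c, spent + c[question])
               for child in branches.values())


print(round(questionnaire_cost(TREE, P, UNIT_COST), 2))  # 3.35 with unit costs
```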

Multiple questionnaires of different cost can be constructed for each individual problem,
and in this paper we consider the problem of building a questionnaire that is optimal in
the sense of the defined cost function.

We call the task of constructing an optimal unweighted questionnaire (with the cost of
every question equal to 1) Optimal Questionnaire (OQ), or Optimal Binary Questionnaire
(OBQ) for the binary case. Weighted versions of these tasks will be called, respectively,
Optimal Weighted Questionnaire (OWQ) and Optimal Weighted Binary Questionnaire (OWBQ).
All these problems will be called the problems of the theory of questionnaires.

We call an individual problem of the theory of questionnaires logically complete if any 
pair of events is separated at least by one question. Otherwise obviously it is impossible 
to construct a questionnaire that identifies all the events. 

Obviously, for the logical completeness of any questionnaire theory problem it is necessary
and sufficient that any pair of columns in its table of questions differs in at least one
position. Further in this paper we always assume the logical completeness of the considered
problems.
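
The column criterion translates directly into a check. The following Python sketch assumes a problem table stored as a dictionary of question rows; both the representation and the name `is_logically_complete` are illustrative choices, not taken from the paper.

```python
def is_logically_complete(table):
    """True if every pair of event columns differs in at least one question."""
    columns = list(zip(*table.values()))  # transpose rows into event columns
    return len(set(columns)) == len(columns)


# Two questions separate all four events; one question alone does not.
complete = {1: [0, 0, 1, 1], 2: [0, 1, 0, 1]}
incomplete = {1: [0, 0, 1, 1]}
print(is_logically_complete(complete), is_logically_complete(incomplete))
```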

1.3 Complexity and approximability 

Statement 1
OBQ is NP-hard.

Proof. We will present the reduction from Minimum Set Cover (MSC) [GJ90] to
OBQ. Let $M$ be an individual MSC problem with the universe $U$ and the family $S$ of
subsets of $U$, represented as a binary table with $k$ rows and $n$ columns, where $n = |U|$,
$k = |S|$, and the intersection of column $i$ and row $j$ contains '1' if $u_i \in S_j$ and '0'
otherwise. We assume that the described table doesn't contain identical columns. Otherwise
we can combine each group of identical columns into one without loss of generality.

We will construct the table representation of the derived individual OBQ problem by adding
to the table of the individual MSC problem a column $y_0$ consisting of zeroes. Since the
added column differs from any column in the original MSC table, and this original table
doesn't contain identical columns either, the derived OBQ table is logically complete and
thus allows the construction of a complete questionnaire.

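Let us assign probabilities to the events of the derived OBQ table in the following way:

$$
p(y_0) = p_0 = 1 - n\varepsilon, \qquad p(y_i) = \varepsilon, \quad i = 1, \dots, n, \qquad \text{where } n\varepsilon(k + 1) < 1.
\tag{2}
$$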

We will show now that if $Q$ is the optimal questionnaire for the derived OBQ, then
the set $T$ of subsets of $U$ corresponding to the questions constituting the branch of
$Q$ connecting the root and the event $y_0$ represents a minimal cover for the original
MSC problem.

It is straightforward to show that the constructed set represents a cover for the original
individual MSC problem. Now we will show that it is a minimal cover.

Let us assume the opposite, that is, that there exists a cover $T'$ for the original MSC
problem such that $|T'| < |T|$. Then it is possible to build a questionnaire $Q'$ which
identifies the event $y_0$ using $|T'|$ questions corresponding to the elements of $T'$.
Indeed, since the elements of $T'$ cover all the elements of $U$, for each event
$y_i \in \{y_1, \dots, y_n\}$ there exists an element of $S$ belonging to $T'$ which contains
'1' in the i-th position and therefore distinguishes $y_i$ from $y_0$. The rest of the
questionnaire is not important; it is important only that we can build a complete
questionnaire, since the problem table is logically complete.

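The cost of $Q'$ satisfies the following inequality: $C(Q') \le |T'|\, p_0 + nk\varepsilon$,
since each of the events $y_1, \dots, y_n$ has probability $\varepsilon$ and is identified by
at most $k$ questions.

The cost of $Q$ is $C(Q) = |T|\, p_0 + l\varepsilon$, where $l$ is the sum of the lengths of
the branches connecting the root of $Q$ with the events $y_1, \dots, y_n$. According to our
assumption $|T'| < |T|$ and therefore $|T| - |T'| \ge 1$, and according to the rule
$n\varepsilon(k + 1) < 1$. Finally:
$C(Q) - C(Q') \ge p_0(|T| - |T'|) + l\varepsilon - nk\varepsilon > p_0 - nk\varepsilon = 1 - n\varepsilon - nk\varepsilon = 1 - n\varepsilon(k + 1) > 0$,
and thus $Q$ is not optimal. Contradiction. □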

Statement 2

OQ, OWQ, OWBQ are NP-hard.

Proof. OQ, OWQ, OWBQ are all generalizations of OBQ. □

Since all significant problems of the questionnaire theory are NP-hard, the task of
developing efficient approximate algorithms becomes very important. But before
discussing the proposed algorithm we will check to what extent OBQ and OWBQ are
approximable. This will let us set proper goals regarding the quality of the developed
algorithms.

Feige [Fei98] showed that MSC cannot be approximated in polynomial time within a
factor of $(1 - o(1)) \ln n$ unless NP has quasi-polynomial time algorithms, i.e. unless
$NP \subset DTIME(n^{O(\log \log n)})$. Raz and Safra [RS97] established a lower bound of
$c \ln n$, where $c$ is a constant, under the weaker assumption $P \ne NP$.

Since we can choose $\varepsilon$ as small as necessary and thus make the difference between
the size of the optimal cover and the cost of the corresponding questionnaire as small as
we wish, the results about the inapproximability of MSC also apply to OBQ and OWBQ.
In the rest of this paper we will show how local search and genetic algorithms can be
applied to the problems of the questionnaire theory, and to some extent we will generalize
the proposed approach to the Minimum Set Cover and 0-1-Knapsack problems.

Significant efforts were spent on attempts to apply the local search approach to the
OWBQ [AB91]. Unfortunately the algorithm described there is applicable only
to a very specific case. In this case OWBQ forms a matroid and as a
result the exact solution can be obtained with a greedy algorithm [Bon03].
These difficulties are a consequence of the relatively high internal complexity of the
binary questionnaire model, which makes it practically impossible to develop efficient
neighborhood operators for the local search method as well as correct and efficient
crossover and mutation operators for a genetic algorithm implementation.

2 Local Search 

2.1 Simple greedy strategies 

We will start the construction of the proposed algorithm with an investigation of the
characteristics of simple greedy strategies. Several elementary greedy root question
selection functions (RQSFs) are presented in Table 4. These functions allow the
construction of a questionnaire in a top-down manner by the consecutive choice of the root
question for the subordinate problems produced on the previous steps.
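
The top-down construction driven by an RQSF can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the function names, the dictionary-of-rows table and the toy costs are assumptions, and only the simplest greedy rule (cheapest question) is shown. The sketch relies on logical completeness, so every subproblem with more than one event still has a meaningful question.

```python
def build(table, events, rqsf):
    """Build a questionnaire top-down; rqsf picks the root of each subproblem."""
    if len(events) == 1:
        return events[0]  # a single event is identified: leaf
    # Senseless questions (a single outcome) are removed before choosing.
    table = {t: row for t, row in table.items() if len(set(row)) > 1}
    q = rqsf(table, events)
    branches = {}
    for outcome in sorted(set(table[q])):
        cols = [j for j, value in enumerate(table[q]) if value == outcome]
        sub_events = [events[j] for j in cols]
        sub_table = {t: [row[j] for j in cols] for t, row in table.items()}
        branches[outcome] = build(sub_table, sub_events, rqsf)
    return (q, branches)


# Hypothetical toy problem: three questions over four events.
COST = {1: 2.0, 2: 1.0, 3: 3.0}
TOY = {1: [0, 0, 1, 1], 2: [0, 1, 0, 1], 3: [0, 1, 1, 1]}


def cheapest_question(table, events):
    """Greedy rule: choose the question of minimal cost (ties by number)."""
    return min(table, key=lambda t: (COST[t], t))


print(build(TOY, ["a", "b", "c", "d"], cheapest_question))
```

Any other selection rule with the same signature, for example one based on entropy decrease, can be passed in place of `cheapest_question`.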

Numerous laboratory tests have shown that in most cases function No. 4 demonstrates the
best performance of all. At the same time, in some cases other functions can be more
efficient.

2.2 Composite strategy 

As mentioned, different RQSFs can deliver better solutions for different individual
OWBQ problems. So we can expect that this property will also hold for any set of
subordinate problems of a given individual OWBQ problem.

Keeping this property in mind, we will split the set of all individual OWBQ problems into
a finite set of classes and assign some type of RQSF to each class. Such composite RQSFs
form a space with a natural neighborhood function based on the replacement of the
elementary RQSFs assigned to different classes of OWBQ problems in the composite RQSF.

We will extend the basic set of RQSFs in Table 4 with some artificial functions which
cannot be considered optimizing strategies themselves but which are very
useful for escaping local extrema. We will discuss these functions in more
detail in Section 2.5.



No.   f                               Comments
1     $\arg\min c(t)$                 Question cost
2     $\arg\max \Delta H$             Maximal decrease of entropy
3     $\arg\max \Delta H / c(t)$      Maximal decrease of entropy to cost
4     $\arg\min(\ldots + \ldots)$,    Question preference function
      where $p_s(t)$ is the sum of
      probabilities of outcome $s$
      for $t$

Table 4: Greedy functions



2.3 Decomposition of the space of subordinate problems 



For the partition of the set of individual OWBQ problems into classes, we will choose
some characteristic function that maps the set of individual problems into $\mathbb{R}$.
Table 5 contains the potential candidates for the role of such a characteristic function.



No. 


/ 


fmax 


fmin 


Comments 


1 


H = - X;"^o Pi log2 Pi 


log2 n 





Entropy 


2 


n 
r 


2'- 
r 


r 

2^ 


Compactness 


3 


Hc = - Y.]=Q c^- ^052 c^-, where c^- 
- is the 'discrete' cost 'distribu- 
tion', e.g. Y.'j=Q c'j = 1 


log2r 





Entropy of cost 'distribu- 
tion' 



Table 5: Characteristic functions 



Laboratory testing revealed that, among all the functions presented in Table 5, the
entropy $H(T)$ in most cases yields the most uniform distribution of values for
the subordinate problems of a given individual OWBQ problem, and we
shall use $H(T)$ in the proposed algorithm.

To split a set of individual OWBQ problems into classes, we need to break the range 
of the selected characteristic function into a finite number of intervals. Each of the 
selected intervals will induce a corresponding class of equivalence on the set of individual 
problems. 

An obvious approach is to choose a certain number of intervals of equal length to be 
determined depending on the size of the problem. However, this approach leaves some 
room for improvement. 

An uneven distribution of the values of the characteristic function between the intervals
can lead to a situation where some of the induced classes contain several subordinate
problems each while others remain empty. As a result, the flexibility of the
combined function will decline.

An attempt to compensate for this shortcoming by increasing the number of intervals 
will result in increased complexity of the algorithm. The solution is to use a set of 
intervals of variable size, such that each subordinate problem corresponds to exactly one 
interval. 

The number of subordinate problems of a given individual OWBQ problem is equal to
the number of internal vertices of an arbitrary questionnaire for this problem, and thus
is equal to $n - 1$. We will choose the boundaries between the intervals midway between
adjacent pairs of values in the ordered sequence of values of the characteristic function
calculated for the set of subordinate problems defined by the current questionnaire. In
other words, the system of intervals will be dynamic and will depend on the current
questionnaire or, to be more precise, on the set of subordinate problems defined by the
current questionnaire.
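
The midpoint rule for the interval boundaries can be sketched in Python as follows. The helper names `interval_bounds` and `interval_index` are hypothetical, and the values stand for the characteristic function evaluated on the subordinate problems of the current questionnaire.

```python
import bisect


def interval_bounds(values):
    """Boundaries placed midway between adjacent sorted characteristic values."""
    ordered = sorted(values)
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]


def interval_index(bounds, h):
    """Index of the interval containing h; boundaries act as upper limits."""
    return bisect.bisect_left(bounds, h)


# Three subordinate problems with distinct values fall into three intervals.
bounds = interval_bounds([3.0, 1.0, 2.0])
print(bounds, [interval_index(bounds, h) for h in (1.0, 2.0, 3.0)])
```

With distinct values every subordinate problem lands in its own interval; equal values would share one, which this sketch does not treat specially.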

Obviously, the changes of the questionnaire, carried out at each step of local search, 
will also affect the set of subordinate problems, and as a consequence, at each step the 
system of intervals will require adjustment. 

2.4 The algorithm 

To represent the composite RQSF we will use a table containing two rows. The first row
of this table contains the (upper) boundaries of the intervals of values of the
characteristic function that is used to split the set of subordinate problems of the OWBQ
problem being solved into subsets. The second row contains the type values of the
elementary RQSFs assigned to the corresponding intervals.

As described in the previous section, we will use $n - 1$ intervals with variable
boundaries chosen midway between adjacent pairs of elements of the ordered sequence of
values of the characteristic function calculated for the set of subordinate problems
corresponding to the internal vertices (the questions) of the current questionnaire.

On each iteration of the local search the algorithm produces the neighborhood of the
current solution by sequentially changing the type of the elementary root question
selection function assigned to each interval. The system of intervals of the
characteristic function remains unchanged until the cheapest neighbor has been chosen.

If the solution found is cheaper than the current one, it is designated as the current
one and the system of intervals is updated based on the set of subordinate problems
generated by the questionnaire constructed using the updated combined RQSF. Elementary
RQSF type values are assigned to the newly created intervals in a manner preserving the
RQSF types applied to the subordinate problems of the individual OWBQ problem being
solved before the update of the interval system. Obviously, with this approach the current
composite RQSF generates the same questionnaire after the update of the intervals of the
characteristic function as before the update.

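Algorithm 2.1 Local search for OWBQ

/*
  F        - combined RQSF
  f = F[i] - elementary RQSF assigned to interval i
  Q = F(T) - questionnaire which is the outcome of
             the function F for individual problem T
  C(Q)     - cost of questionnaire Q
  G        - set of elementary RQSFs
*/

<Choose the initial combined function Fnew>
do {
    Fcurrent = Fnew;
    for (int i = 0; i < |Fcurrent|; i++)
        foreach (f' : G)
            if (f' != Fcurrent[i]) {
                // neighbor: change the RQSF type of interval i only
                F' = Fcurrent;
                F'[i] = f';
                // keep the cheapest neighbor found so far
                if (C(F'(T)) < C(Fnew(T)))
                    Fnew = F';
            }
    // repeat while the cheapest neighbor improves the current solution
} while (C(Fnew(T)) < C(Fcurrent(T)));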
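
A runnable Python sketch of the same best-improvement scheme is given here for illustration. It is a simplification under stated assumptions: the composite RQSF is reduced to a plain list of type values, `cost` stands for the evaluation $C(F(T))$ and is supplied by the caller, and the update of the interval system after each accepted move is omitted. The toy cost in the usage example is hypothetical.

```python
def local_search(initial, types, cost):
    """Best-improvement local search over lists of RQSF type values."""
    current = list(initial)
    current_cost = cost(current)
    while True:
        best, best_cost = current, current_cost
        for i in range(len(current)):
            for t in types:
                if t == current[i]:
                    continue
                # Neighbor: the type assigned to interval i is replaced by t.
                neighbor = current[:i] + [t] + current[i + 1:]
                c = cost(neighbor)
                if c < best_cost:
                    best, best_cost = neighbor, c
        if best_cost >= current_cost:
            return current  # no cheaper neighbor: local optimum reached
        current, current_cost = best, best_cost


# Toy cost (hypothetical): number of positions differing from a hidden target.
target = [2, 0, 1]
mismatches = lambda f: sum(a != b for a, b in zip(f, target))
print(local_search([0, 0, 0], [0, 1, 2], mismatches))
```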
2.5 Analysis of the test results

The results of testing Algorithm 2.1, presented in Table 7, let us conclude that the
algorithm is quite efficient and mostly produces solutions that are better than or similar
to those of the known approximation algorithms.

However, in a substantial number of cases the proposed algorithm did not improve on the
results of the elementary methods. In addition to the above results about the NP-hardness
and nonapproximability of OWBQ/OBQ, there is another reason for the difficulty of
developing high-quality approximation algorithms for these problems.

The space of solutions of a typical OWBQ problem contains a significant number of local 
extremums. This property of the questionnaire theory problems is a consequence of the 
tree structure of the questionnaire and the properties of the cost function allowing inde- 
pendent existence of multiple local extremums for different subtrees of a questionnaire 
both on independent branches and combined hierarchically. 

It should be noted that the neighborhood function used in the proposed algorithm can
link entirely different questionnaires to each other. For example, if the elementary
function has been changed for the interval which contains the root problem, then,
beginning with the change of the root question, the construction of the questionnaire
proceeds in a completely different way. However, we still endeavored to make our algorithm
more resistant to local extrema.

To achieve this we will extend the neighborhood by expanding the set of RQSFs with
some special 'dumb' functions $F_k$, each returning the constant question number $k$. In
fact, the newly added functions are not exactly constant: as the number of available
questions decreases during the gradual construction of a questionnaire, some of the
questions become senseless and are removed from the problem table, so we need
to ensure that any 'dumb' function returns values not exceeding the number of questions
in the current problem table.

To achieve this the 'dumb' functions return the value $k \bmod n$. The use of 'dumb'
functions, in effect, lets the algorithm take a step aside at each step, thus trying to
avoid a possible local extremum. The 'Mixed' column of Table 7 presents the results
of the tests of Algorithm 2.1 with the extended neighborhood.
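
A 'dumb' function can be sketched in a few lines of Python. The factory name `make_dumb_rqsf` and the table layout are assumptions of this illustration, and the modulus is taken over the number of questions still present in the current problem table, which is how the sketch reads the bound described in the text.

```python
def make_dumb_rqsf(k):
    """Return an RQSF that always picks question position k, wrapped around."""
    def rqsf(table, events):
        questions = sorted(table)  # questions still present in the sub-table
        return questions[k % len(questions)]
    return rqsf


# With questions 3, 5 and 8 left, the function for k = 4 picks position 4 mod 3 = 1.
remaining = {3: [0, 1], 5: [1, 0], 8: [0, 1]}
print(make_dumb_rqsf(4)(remaining, ["a", "b"]))
```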

Additional evidence justifying the inclusion of the 'dumb' functions discussed above
is provided by the results of testing Algorithm 2.1 with 'dumb' functions only (see
Table 7, column 'Dumb'). Without the greedy functions the method is
slightly worse but nevertheless quite effective.

3 Genetic algorithms 

We will use traditional GA terminology, see e.g. [Hol92], [BBM93a], [BBM93b].

3.1 Representation of individuals

Let us consider how the proposed approach can be used to develop a genetic algorithm
(e.g. see [Hol92], [BBM93a], [BBM93b]) for the OWBQ. Coding solutions in the form
of a linear chain significantly simplifies the development of genetic operators. However,
the solution model that we used in Algorithm 2.1 has a more complex structure. It includes
the partition of the set of individual problems into classes induced by the set of
intervals of some characteristic function, together with the mapping of these intervals to
the set of subordinate problems defined by the questionnaire of this solution.

Since we use intervals with variable boundaries, a simple substitution of some type 
value of elementary functions from one solution to another has little meaning since the 
function, type of which will be transferred, can be applied to the individual problems in 
the range of the characteristic function different than in the solution, from which it was 
borrowed. 

Therefore, in order to simplify the genetic operators we are forced to fall back to the
solution representation with intervals of equal length. However, in order to avoid a too
uneven distribution of subproblems among the intervals, we will increase the number of
intervals. During the laboratory tests different approaches to selecting the number
of intervals were investigated, but the most effective were the values between nr and

3.2 Genetic operators 

Since we switched to the simplified representation of individual OWBQ problems, which 
now is equivalent to a linear string of values, development of genetic operators becomes 
a trivial task. 

To implement the crossover operator it is enough to break the two genotype chains which
we are going to cross over at a certain position and glue together the pairs of obtained
fragments from different chains. During the laboratory tests some more complex operators
were tried, including 2-point and uniform crossover [BBM93a]. However, the
real impact of these modifications was insignificant and we have chosen the simplest
approach.

The implemented mutation operator is also fairly simple. The type value of the elementary
root question selection function in a randomly selected position of the solution is
replaced by another randomly selected type.

However, due to the redundancy of the set of intervals, the replacement of a single
gene has very little impact, and it was decided to increase the number of genes that are
changed within a single mutation. Different methods of choosing the number of genes
subject to mutation have been tested and the value of $l/r$, where $l$ is the length
of the genotype, has been chosen as the most efficient one.
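
The two operators can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: genotypes are plain lists of RQSF type values, the function names are hypothetical, and the number of mutated genes is passed in by the caller (for instance `max(1, l // r)` to follow the $l/r$ rule).

```python
import random


def one_point_crossover(a, b, rng):
    """Cut both genotypes at one random position and swap the tails."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(genotype, types, count, rng):
    """Replace the type value in `count` distinct random positions."""
    result = list(genotype)
    for pos in rng.sample(range(len(result)), count):
        # Choose a type different from the current one so the gene really changes.
        result[pos] = rng.choice([t for t in types if t != result[pos]])
    return result


rng = random.Random(0)
child_a, child_b = one_point_crossover([0] * 6, [1] * 6, rng)
mutant = mutate([0] * 6, [0, 1, 2], 2, rng)
print(child_a, child_b, mutant)
```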
3.3 Choosing a strategy for the formation of generations

There are different approaches to the formation of generations in genetic algorithms.
In one approach, each new generation has the same size as the entire population; in
another, it represents only some part of the population. Sometimes all members of
a new generation are included in the population, displacing the least adapted
members of previous generations; sometimes competition between the new and previous
generations is implemented. We have chosen the option of several generations with
competitive incorporation of new members (see Algorithm 3.1).

Another important aspect of the strategy is the method of selecting individuals for
mating. The cost of the questionnaire is not a good fitness function
because the relative difference between the costs of different questionnaires is quite
small and doesn't provide enough advantage for cheaper solutions during selection. To
ensure effective selection and to help prevent premature convergence, we will use as the
fitness function the questionnaire cost scaled as follows: $C(Q) - \min_{Q' \in G} C(Q')$,
where $G$ is the new generation.
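
The effect of the scaling can be seen on a small numeric example. The Python sketch below only performs the shift by the generation minimum; the cost values are hypothetical, and how the shifted values are turned into selection probabilities is not specified in the text, so it is left out.

```python
def scaled_costs(costs):
    """Shift every cost by the minimum cost within the generation."""
    best = min(costs)
    return [c - best for c in costs]


# Raw costs differ by well under 1 percent; after the shift the gaps dominate.
generation = [18.40, 18.45, 18.50]
print([round(c, 2) for c in scaled_costs(generation)])
```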

3.4 Parameters of the algorithm 



Table 6 presents the key parameters of Algorithm 3.1.



Parameter name                               Description
Mating Rate                                  Average number of matings per individual
Mutation Rate                                Probability of mutation
Length of genotype                           Number of symbols in genotype
Number of generations without improvements   Parameter used in algorithm halt condition
Maximal total number of generations          Parameter used in algorithm halt condition
RQSF set                                     See Table 4 and the 'dumb' functions
Characteristic function                      See Table 5

Table 6: Parameters of genetic algorithm



3.5 Analysis of test results 

The main result of testing was the demonstration of the effectiveness of Algorithm 3.1.
For some of the problems the proposed algorithm provided better solutions (see Table 7)
than the ones obtained with the help of the basic greedy methods and with the help of
Algorithm 3.1 Genetic Algorithm for OWBQ

/*
  improvement - difference between maximum fitness values
                of two consecutive generations
*/

<prepare initial population>
iterationNo = 0;
generationNo = 0;
do {
    for (int i = 0; i < populationSize * matingRate; i++) {
        <choose male>;
        <choose female>;
        <mate selected individuals>;
        <apply mutation to offspring>;
        <add offspring to population>;
    }
    while (|population| > populationSize)
        <remove least fit individual>;
    generationNo++;
    if (improvement == 0.0)
        iterationNo++;
    else
        iterationNo = 0;
} while (iterationNo < generationsWithoutImprovement
         && generationNo < maxNumberOfGenerations);



Algorithm 2.1. In many cases the algorithm finds a solution of the same quality as the one
found with the basic greedy functions.
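
The generation scheme with competitive incorporation of offspring can be sketched in Python as follows. This is a simplified illustration under several assumptions: parents are drawn uniformly at random instead of by scaled fitness, a mutation changes a single gene, `cost` is supplied by the caller in place of the questionnaire cost, and all parameter names and default values are hypothetical.

```python
import random


def genetic_search(cost, types, length, pop_size=20, mating_rate=0.5,
                   mutation_rate=0.2, stall_limit=10, max_generations=200, seed=0):
    """Steady population: offspring compete with all current members."""
    rng = random.Random(seed)
    population = [[rng.choice(types) for _ in range(length)]
                  for _ in range(pop_size)]
    best = min(map(cost, population))
    stalled = generation = 0
    while stalled < stall_limit and generation < max_generations:
        for _ in range(int(pop_size * mating_rate)):
            male, female = rng.sample(population, 2)
            point = rng.randrange(1, length)  # one-point crossover
            for child in (male[:point] + female[point:],
                          female[:point] + male[point:]):
                if rng.random() < mutation_rate:
                    child[rng.randrange(length)] = rng.choice(types)
                population.append(child)
        # Competitive incorporation: only the cheapest pop_size members survive.
        population.sort(key=cost)
        del population[pop_size:]
        generation += 1
        new_best = cost(population[0])
        stalled = stalled + 1 if new_best >= best else 0
        best = min(best, new_best)
    return population[0]


# Toy cost (hypothetical): number of positions differing from a hidden target.
target = [2, 0, 1, 1, 0, 2]
mismatches = lambda g: sum(a != b for a, b in zip(g, target))
print(genetic_search(mismatches, [0, 1, 2], 6))
```

Because the truncation step always keeps the cheapest individual, the best cost in the population never increases from one generation to the next.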

4 Application of the developed method to other combina- 
torial optimization problems 

In this section we will discuss how the developed algorithm can be applied to the Min- 
imum Set Cover, to the Weighted Set Cover and to the 0-1-Knapsack prob- 
lems. 

4.1 Minimum Set Cover and Weighted Set Cover 

Any individual MSC problem can be reduced to OBQ using the method described
in the proof of Statement 1 (Section 1.3). Likewise, any Weighted Set Cover problem can be
reduced to OWBQ. The probability distribution of the obtained OBQ/OWBQ is quite
specific, and the entropy function loses its discriminative properties as a characteristic
function.

So we switch to the Compactness function (see Table 5). For the Weighted Set
Cover problem the 'cost entropy' function $H_c$ can also be used. It is also worthwhile
to modify the set of RQSFs. Then Algorithms 2.1 and 3.1 are applicable without any
changes.

4.2 Combinatorial 0-1-knapsack 

In this subsection we show how a modification of OWBQ can be used as a
representation of the 0-1-Knapsack problem in order to make algorithms 2.1 and 3.1
applicable to this problem.

Consider a modification of OWBQ with a limited maximum length of the branches. This
problem will be called the Limited Depth Questionnaire (LDQ) problem (see
[AB10]). We now describe this problem in more detail.

It is obvious that, in general, the construction of a questionnaire which fully identifies
the set of events is impossible under the condition of restricted depth. As a result of this
limitation, and due to the properties of the considered problem, the notion of the cost
of the questionnaire as a criterion of optimality becomes meaningless. Therefore, we
propose a criterion that reflects the degree of identification of the set $L$ of events
by the questionnaire being evaluated.

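We assume that a weight function $d$ is defined on the search area $L$, assigning a weight
$d(y_i)$ to each element and satisfying the axioms of a measure:

\[
\begin{aligned}
&\forall L_1 \subseteq L : d(L_1) \ge 0 \\
&\forall L_1 \subseteq L : d(L_1) = 0 \Leftrightarrow L_1 = \emptyset \\
&\forall L_1, L_2 \subseteq L : d(L_1 \cup L_2) \le d(L_1) + d(L_2) \\
&\forall L_1, L_2 \subseteq L : d(L_1 \cup L_2) = d(L_1) + d(L_2) \Leftrightarrow L_1 \cap L_2 = \emptyset \\
&\forall L_1, L_2 \subseteq L : L_1 \subseteq L_2 \Rightarrow d(L_1) \le d(L_2)
\end{aligned}
\tag{3}
\]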

We shall call $d(L^*)$ the size of the set of events $L^*$. Let $d(L) = 1$. We will consider
the case when all elements of the search area have the same size: $\forall y_i : d(y_i) = 1/n$.
This approach reflects the situation when all events in $L$ have equal importance from the
identification perspective.
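As a small sanity check, the uniform size function can be tested against the measure axioms by brute force on a tiny search area. The sketch below reads the axioms as non-negativity, zero size only for the empty set, subadditivity, additivity exactly for disjoint sets, and monotonicity; this reading, as well as the helper names `subsets`, `uniform_size` and `satisfies_axioms`, is an assumption of the example.

```python
from itertools import combinations


def subsets(universe):
    """Yield every subset of the universe as a frozenset."""
    items = list(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def uniform_size(subset, n):
    """d(L*) when every element has size 1/n."""
    return len(subset) / n


def satisfies_axioms(universe, d, tol=1e-12):
    """Check the five axioms for all subsets and all pairs of subsets."""
    all_subsets = list(subsets(universe))
    for a in all_subsets:
        if d(a) < 0:
            return False                      # non-negativity
        if (abs(d(a)) <= tol) != (len(a) == 0):
            return False                      # zero size iff empty
        for b in all_subsets:
            union, total = d(a | b), d(a) + d(b)
            if union > total + tol:
                return False                  # subadditivity
            if (abs(union - total) <= tol) != (len(a & b) == 0):
                return False                  # additivity iff disjoint
            if a <= b and d(a) > d(b) + tol:
                return False                  # monotonicity
    return True


n = 4
uniform_ok = satisfies_axioms(range(n), lambda s: uniform_size(s, n))
constant_ok = satisfies_axioms(range(3), lambda s: 1.0)  # d(empty set) != 0
```

The check is exponential in the number of events and is meant only for very small examples.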

A quantitative characterization of the degree of identification of the system $L$ with
respect to its partition into subsets $L_1, \ldots, L_k$ is given by:

\[
D(L_1, \ldots, L_k) = M(L_1, \ldots, L_k) = \sum_{i=1}^{k} d(L_i) \sum_{y_j \in L_i} p(y_j) \tag{4}
\]



Obviously, the smaller the value of $D$, the greater the overall depth of identification.
Thus, we will strive to minimize the average size of the partition produced by the
questionnaire under the condition that the total cost of the questions asked along any
branch of the questionnaire does not exceed some specified value $c^*$.

Statement 3 

LDQ is NP-complete.

Proof. Let $I$ be an individual 0-1-Knapsack problem. We form the corresponding LDQ as
follows. For each element $e_i \in I$ we include a question $t_i$ which is a single-event
check, i.e. a question that separates exactly one event from the remaining $n - 1$ events
(equation (5)).

We set the cost $c(t_i) = d(e_i)$, where $d(e_i)$ is the weight of the element $e_i$ in the
individual 0-1-Knapsack problem $I$ being reduced. We also set the probabilities of all
events equal to each other and let $c^* = d^*$, where $d^*$ is the knapsack size in the
problem $I$.

Obviously, the questionnaire that is optimal in the sense of criterion [3] for the derived
individual LDQ corresponds to the optimal packing of the knapsack in the problem $I$. □
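The construction used in the proof can be illustrated on a toy instance. The Python sketch below rests on several assumptions that are not stated in the text: the dictionary layout returned by `knapsack_to_ldq` is hypothetical; a questionnaire built only from single-event checks is treated as a chain, so the cost of its longest branch is the sum of the costs of all questions asked; and since all probabilities are equal, only the number of packed items is compared (item values are not modelled). The brute-force search in `best_chain` is for illustration only.

```python
from itertools import combinations


def knapsack_to_ldq(weights, capacity):
    """Build an LDQ instance from item weights and the knapsack size."""
    n = len(weights)
    return {
        "events": list(range(n)),
        "probabilities": [1.0 / n] * n,       # all events equally likely
        "question_costs": list(weights),      # c(t_i) = d(e_i)
        "max_branch_cost": capacity,          # c* = d*
    }


def chain_criterion(n, m):
    """Criterion D after m single-event checks asked one after another.

    The checks isolate m singletons and leave one block of n - m events.
    """
    return m * (1 / n) ** 2 + ((n - m) / n) ** 2


def best_chain(ldq):
    """Exhaustively find the admissible set of checks with the smallest D."""
    n = len(ldq["events"])
    best = (chain_criterion(n, 0), ())
    for m in range(1, n + 1):
        for chosen in combinations(range(n), m):
            cost = sum(ldq["question_costs"][i] for i in chosen)
            if cost <= ldq["max_branch_cost"]:
                best = min(best, (chain_criterion(n, m), chosen))
    return best


ldq = knapsack_to_ldq(weights=[3, 4, 5, 2], capacity=7)
best_value, best_items = best_chain(ldq)
```

For these weights no three items fit into a knapsack of size 7, so the smallest value of D is reached by a pair of items, i.e. by a packing with the largest admissible number of items.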

Now, as in the case of the covering problems, we can apply algorithms 2.1 and 3.1 after
replacing the characteristic function and modifying the set of RQSFs. The only
remaining step is to transform the individual 0-1-Knapsack problem to be solved to
OWBQ as described in the proof of statement 4.2.

We have to underline that in this case algorithm 3.1 will require some minor changes.
Since the number of questions in the LDQ can be less than $n - 1$, we need some
method to calculate the interval boundaries for the absent subordinate problems. This
task can be accomplished, e.g., by consecutively splitting the largest existing interval
into two equal ones until the necessary number of boundaries is reached.
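The splitting procedure can be sketched in Python as follows. The sketch assumes that the boundaries are real numbers lying strictly inside an interval [low, high], taken here to be [0, 1]; this range, the tie-breaking rule (the leftmost of several equally wide intervals is split first) and the function name `complete_boundaries` are assumptions of the example.

```python
def complete_boundaries(boundaries, required, low=0.0, high=1.0):
    """Add midpoints of the widest intervals until `required` boundaries exist."""
    points = sorted(set(boundaries))
    while len(points) < required:
        edges = [low] + points + [high]
        # Width of every interval together with the index of its left edge.
        gaps = [(edges[i + 1] - edges[i], i) for i in range(len(edges) - 1)]
        _, i = max(gaps, key=lambda gap: gap[0])   # first widest interval
        midpoint = (edges[i] + edges[i + 1]) / 2
        points.insert(i, midpoint)                 # keeps the list sorted
    return points


completed = complete_boundaries([0.5], required=3)   # [0.25, 0.5, 0.75]
from_empty = complete_boundaries([], required=1)     # [0.5]
```

If the required number of boundaries is already present, the list is returned unchanged (sorted and without duplicates).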

5 Test results 

The results of the tests of algorithms 2.1 and 3.1 for OWBQ are presented in table 7;
the legend for the header is given below:

Opt. - Cost of optimal questionnaire 



\[
\forall t_i : |L_{s_1}(t_i)| = 1 \;\wedge\; |L_{s_2}(t_i)| = n - 1 \tag{5}
\]
QPF - Question preference function (see tabled])
'Dumb' - Algorithm 2.1 with 'Dumb' RQSFs only
Greed - Algorithm 2.1 with greedy RQSFs from table H]
Mixed - Algorithm 2.1 with both 'Dumb' and greedy RQSFs
GA - Algorithm 3.1

6 Conclusion 

The proposed approach has allowed us to develop local search and genetic algorithms which
exceed all known approximation algorithms in quality. Because of their universality, the
developed algorithms can be applied not only to various questionnaire optimization
problems but also to the Minimum Set Cover and Weighted Set Cover problems,
to the 0-1-Knapsack problem and probably to other combinatorial optimization problems.
LDQ and latticoids are two examples of the flexibility of the mathematical model of
the questionnaire, which lets us believe that many other combinatorial optimization
problems can be represented as questionnaires and thus solved using the proposed
approach.

All these problems are characterized by known difficulties in developing a neighborhood
function for local search, as well as in developing genetic operators. The reason
for this situation is the specific structure of the solutions of all these problems, which
does not allow an efficient implementation of the necessary manipulations. The proposed
method alleviates this difficulty.

Local search algorithms and genetic algorithms are highly adaptive tools. They provide
the flexibility needed to handle efficiently the various special cases of the general
problems mentioned above, as well as to search heuristically for solutions of specific
individual problems.
Table 7: Test results